Failure Resilient Heterogeneous Parallel Computing Across Multidomain Clusters

Authors

  • Dawid Kurzyniec
  • Vaidy S. Sunderam
Abstract

We propose lightweight middleware solutions that facilitate and simplify the execution of failure-resilient MPI programs across multidomain clusters. The system described in this paper leverages H2O, a distributed metacomputing framework, to route MPI message passing across heterogeneous aggregates located in different administrative or network domains. MPI programs instantiate a specially written H2O pluglet; messages that are destined for remote sites are intercepted and transparently forwarded to their final destinations. We demonstrate that the proposed technique is indeed effective in enabling communication by MPI programs across distinct clusters and across firewalls. Only minimally lowered performance was observed in our tests, and we believe the substantially increased functionality would compensate for this overhead in most situations. In addition to enabling multi-cluster communications, we note that with the increasing size and distribution of metacomputing environments, fault tolerance aspects become critically important. We argue that the fault tolerance model proposed by FT-MPI fits well in geographically distributed environments, even though its current implementation is confined to a single administrative domain. We describe extensions to overcome these limitations by combining FT-MPI with the H2O framework. Our approach allows users to run fault tolerant MPI programs on heterogeneous, geographically distributed shared machines, without sacrificing performance and with minimal involvement of resource providers.
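The forwarding idea in the abstract can be illustrated with a small in-memory sketch. It is not the H2O pluglet API and it does not use MPI. The `Gateway` class, the `send` function, and the rank-to-site map are hypothetical names introduced only for illustration, on the assumption that each site runs one forwarding component and that ranks are statically assigned to sites. Messages between ranks at the same site are delivered directly. Messages for a remote site are intercepted and handed to the peer gateway.

```python
from collections import defaultdict


class Gateway:
    """Hypothetical stand-in for a per-site forwarding component."""

    def __init__(self, site):
        self.site = site
        self.peers = {}                     # site name -> Gateway
        self.inboxes = defaultdict(list)    # local rank -> received messages
        self.forwarded = 0                  # messages this site sent off-site

    def connect(self, other):
        # Link two gateways in both directions.
        self.peers[other.site] = other
        other.peers[self.site] = self

    def deliver(self, src, dst, payload):
        # Final hop: place the message in the destination rank's inbox.
        self.inboxes[dst].append((src, payload))


def send(rank_to_site, gateways, src, dst, payload):
    """Route one message: direct if same site, via the gateways otherwise."""
    src_site = rank_to_site[src]
    dst_site = rank_to_site[dst]
    local_gw = gateways[src_site]
    if src_site == dst_site:
        local_gw.deliver(src, dst, payload)
        return "local"
    # Destination is remote: intercept and forward through the peer gateway.
    local_gw.forwarded += 1
    local_gw.peers[dst_site].deliver(src, dst, payload)
    return "forwarded"


def demo():
    # Assumed layout: ranks 0-1 at site A, ranks 2-3 at site B.
    rank_to_site = {0: "A", 1: "A", 2: "B", 3: "B"}
    gateways = {"A": Gateway("A"), "B": Gateway("B")}
    gateways["A"].connect(gateways["B"])
    routes = [
        send(rank_to_site, gateways, 0, 1, "hello"),
        send(rank_to_site, gateways, 0, 3, "world"),
    ]
    return routes, gateways
```

The sender calls the same `send` function in both cases, which is the sense in which the forwarding is transparent to the application. A real deployment would replace the direct `deliver` call on the peer with a network hop that is allowed to cross the firewall.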

Similar resources

Parallel computing using MPI and OpenMP on self-configured platform, UMZHPC.

Parallel computing is a topic of interest for a broad scientific community since it facilitates many time-consuming algorithms in different application domains. In this paper, we introduce a novel platform for parallel computing by using MPI and OpenMP programming languages based on a set of networked PCs. UMZHPC is a free Linux-based parallel computing infrastructure that has been developed to cr...


Experiences with Asynchronous Communication Models in VEOS, a Distributed Programming Facility for Uniprocessor LANs

Like conventional multiprocessors, workstation clusters can provide data sharing and parallel computing. But unlike multiprocessors, these clusters provide flexible connectivity and can tolerate heterogeneous processing elements. Uniprocessor LANs are a common choice for cost-effective computing. The workstation nodes typically run a version of Unix and support common Unix services such as reli...


Performance Evaluation of Static and Dynamic Load Balancing Schemes for a Parallel Computational Fluid Dynamics Software (CFD) Application (FLUENT) Distributed across Clusters of Heterogeneous Symmetric Multiprocessor Systems

Computational Fluid Dynamics (CFD) applications are “highly parallelizable” and can be distributed across a cluster of computers. However, because computation time can vary with the distributed part (mesh), the system loads are unpredictable and processors can have widely different computation speeds. Load balancing (and thus computational efficiency) across a heterogeneous cluster of processor...


Risk-Tolerant Heuristic Scheduling for Trusted Grid Computing on Realistic Platforms

Shanshan Song, Student Member, IEEE, Yu-Kwong Kwok, Senior Member, IEEE, and Kai Hwang, Fellow, IEEE. Abstract: Realistic platforms for Grid computing face security threats from network attacks. Heterogeneous clusters in the open Grid are likely working in different autonomous domains (ADs). Grid jobs dispat...


HeteroPBLAS: A Set of Parallel Basic Linear Algebra Subprograms Optimized for Heterogeneous Computational Clusters

This paper presents a software library, called Heterogeneous PBLAS (HeteroPBLAS), which provides optimized parallel basic linear algebra subprograms for heterogeneous computational clusters. This library is written on top of HeteroMPI and PBLAS whose building blocks, the de facto standard kernels for matrix and vector operations (BLAS) and message passing communication (BLACS), are optimize...


Journal:
  • IJHPCA

Volume 19

Publication date: 2005